{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-ETtu9CvVMDR"
   },
   "source": [
    "<h1>Chapter 6 - Prompt Engineering</h1>\n",
    "<i>Methods for improving the output through prompt engineering.</i>\n",
    "\n",
    "<a href=\"https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961\"><img src=\"https://img.shields.io/badge/Buy%20the%20Book!-grey?logo=amazon\"></a>\n",
    "<a href=\"https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/\"><img src=\"https://img.shields.io/badge/O'Reilly-white.svg?logo=\"></a>\n",
    "<a href=\"https://github.com/HandsOnLLM/Hands-On-Large-Language-Models\"><img src=\"https://img.shields.io/badge/GitHub%20Repository-black?logo=github\"></a>\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter06/Chapter%206%20-%20Prompt%20Engineering.ipynb)\n",
    "\n",
    "---\n",
    "\n",
    "This notebook is for Chapter 6 of the [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) book by [Jay Alammar](https://www.linkedin.com/in/jalammar) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/).\n",
    "\n",
    "---\n",
    "\n",
    "<a href=\"https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961\">\n",
    "<img src=\"https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png\" width=\"350\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### [OPTIONAL] - Installing Packages on <img src=\"https://colab.google/static/images/icons/colab.png\" width=100>\n",
    "\n",
    "If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **uncomment and run** the following codeblock to install the dependencies for this chapter:\n",
    "\n",
    "---\n",
    "\n",
    "💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to\n",
    "**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.\n",
    "\n",
    "---\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%capture\n",
    "# !pip install langchain>=0.1.17 openai>=1.13.3 langchain_openai>=0.1.6 transformers>=4.40.1 datasets>=2.18.0 accelerate>=0.27.2 sentence-transformers>=2.5.1 duckduckgo-search>=5.2.2\n",
    "# !CMAKE_ARGS=\"-DLLAMA_CUBLAS=on\" pip install llama-cpp-python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KaB0B1LdMcPM"
   },
   "source": [
    "## Loading our model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 269,
     "referenced_widgets": [
      "977271302f104ebdb11c0a2a8183cc93",
      "931c6ca8360a4e79924399f017f5522d",
      "2ff276f9bbab440bb036c1109c023582",
      "6d5243b8997d4f64af62900cb19489d0",
      "680d3647069b45c0b4353548ac52ef37",
      "0512bf845dda462395f58c06dfb0733d",
      "26985bad36884566a7bd4ba6f1037e80",
      "e174502c4dfe4740922264d67a9a74b5",
      "9da5552b33c1418d9e0190ec66f19c9d",
      "33acfbdb950545a6a67a9b39be02e30b",
      "c6d9399532c14ce58189b9d038e4cfa4"
     ]
    },
    "executionInfo": {
     "elapsed": 58517,
     "status": "ok",
     "timestamp": 1715083822638,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "mNkbw28oMeAM",
    "outputId": "c0b97b59-180f-427f-8d49-398cb1b39c40"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
      "  warnings.warn(\n",
      "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: \n",
      "The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
      "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
      "You will be able to reuse this secret in all of your notebooks.\n",
      "Please note that authentication is recommended but still optional to access public models or datasets.\n",
      "  warnings.warn(\n",
      "WARNING:transformers_modules.microsoft.Phi-3-mini-4k-instruct.920b6cf52a79ecff578cc33f61922b23cbc88115.modeling_phi3:`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.\n",
      "WARNING:transformers_modules.microsoft.Phi-3-mini-4k-instruct.920b6cf52a79ecff578cc33f61922b23cbc88115.modeling_phi3:Current `flash-attenton` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "977271302f104ebdb11c0a2a8183cc93",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline\n",
    "\n",
    "# Load model and tokenizer\n",
    "model = AutoModelForCausalLM.from_pretrained(\n",
    "    \"microsoft/Phi-3-mini-4k-instruct\",\n",
    "    device_map=\"cuda\",\n",
    "    torch_dtype=\"auto\",\n",
    "    trust_remote_code=False,\n",
    ")\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"microsoft/Phi-3-mini-4k-instruct\")\n",
    "\n",
    "# Create a pipeline\n",
    "pipe = pipeline(\n",
    "    \"text-generation\",\n",
    "    model=model,\n",
    "    tokenizer=tokenizer,\n",
    "    return_full_text=False,\n",
    "    max_new_tokens=500,\n",
    "    do_sample=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 3374,
     "status": "ok",
     "timestamp": 1715083826011,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "uZIMVL0Q3g8P",
    "outputId": "2df55f98-fcf7-448e-b4de-3ae1ed8c1a5f"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING:transformers_modules.microsoft.Phi-3-mini-4k-instruct.920b6cf52a79ecff578cc33f61922b23cbc88115.modeling_phi3:You are not running the flash-attention implementation, expect numerical differences.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Why don't chickens like to go to the gym? Because they can't crack the egg-sistence of it!\n"
     ]
    }
   ],
   "source": [
    "# Prompt\n",
    "messages = [\n",
    "    {\"role\": \"user\", \"content\": \"Create a funny joke about chickens.\"}\n",
    "]\n",
    "\n",
    "# Generate the output\n",
    "output = pipe(messages)\n",
    "print(output[0][\"generated_text\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 8,
     "status": "ok",
     "timestamp": 1715083826011,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "YOGpAY8w4odt",
    "outputId": "2cc5de39-2a60-4ec4-9894-72b7b95a2fbb"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<s><|user|>\n",
      "Create a funny joke about chickens.<|end|>\n",
      "<|assistant|>\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Apply prompt template\n",
    "prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False)\n",
    "print(prompt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 3497,
     "status": "ok",
     "timestamp": 1715083829501,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "eLd_XXaR3i04",
    "outputId": "fa46605a-6e0f-4647-d725-b2ee5f2528b7"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Why don't chickens ever play hide and seek? Because good luck hiding when everyone always goes to the henhouse!\n"
     ]
    }
   ],
   "source": [
    "# Using a high temperature\n",
    "output = pipe(messages, do_sample=True, temperature=1)\n",
    "print(output[0][\"generated_text\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 1813,
     "status": "ok",
     "timestamp": 1715083831313,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "sz61Fvtk580U",
    "outputId": "38839183-60f7-49cc-f695-11841d116a7e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Why don't chickens like math class? Because they can't solve for \"x\" in their eggs!\n"
     ]
    }
   ],
   "source": [
    "# Using a high top_p\n",
    "output = pipe(messages, do_sample=True, top_p=1)\n",
    "print(output[0][\"generated_text\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "K7oqc_Qj2008"
   },
   "source": [
    "# **Intro to Prompt Engineering**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sqjYQBLh26B_"
   },
   "source": [
    "## The Basic Ingredients of a Prompt\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "lnojy54u2uQa"
   },
   "source": [
    "# **Advanced Prompt Engineering**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "NMF9p8qK58Ou"
   },
   "source": [
    "## Complex Prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "uEQ9ZnSH73yP"
   },
   "outputs": [],
   "source": [
    "# Text to summarize which we stole from https://jalammar.github.io/illustrated-transformer/ ;)\n",
    "text = \"\"\"In the previous post, we looked at Attention – a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer – a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Google Neural Machine Translation model in specific tasks. The biggest benefit, however, comes from how The Transformer lends itself to parallelization. It is in fact Google Cloud’s recommendation to use The Transformer as a reference model to use their Cloud TPU offering. So let’s try to break the model apart and look at how it functions.\n",
    "The Transformer was proposed in the paper Attention is All You Need. A TensorFlow implementation of it is available as a part of the Tensor2Tensor package. Harvard’s NLP group created a guide annotating the paper with PyTorch implementation. In this post, we will attempt to oversimplify things a bit and introduce the concepts one by one to hopefully make it easier to understand to people without in-depth knowledge of the subject matter.\n",
    "Let’s begin by looking at the model as a single black box. In a machine translation application, it would take a sentence in one language, and output its translation in another.\n",
    "Popping open that Optimus Prime goodness, we see an encoding component, a decoding component, and connections between them.\n",
    "The encoding component is a stack of encoders (the paper stacks six of them on top of each other – there’s nothing magical about the number six, one can definitely experiment with other arrangements). The decoding component is a stack of decoders of the same number.\n",
    "The encoders are all identical in structure (yet they do not share weights). Each one is broken down into two sub-layers:\n",
    "The encoder’s inputs first flow through a self-attention layer – a layer that helps the encoder look at other words in the input sentence as it encodes a specific word. We’ll look closer at self-attention later in the post.\n",
    "The outputs of the self-attention layer are fed to a feed-forward neural network. The exact same feed-forward network is independently applied to each position.\n",
    "The decoder has both those layers, but between them is an attention layer that helps the decoder focus on relevant parts of the input sentence (similar what attention does in seq2seq models).\n",
    "Now that we’ve seen the major components of the model, let’s start to look at the various vectors/tensors and how they flow between these components to turn the input of a trained model into an output.\n",
    "As is the case in NLP applications in general, we begin by turning each input word into a vector using an embedding algorithm.\n",
    "Each word is embedded into a vector of size 512. We'll represent those vectors with these simple boxes.\n",
    "The embedding only happens in the bottom-most encoder. The abstraction that is common to all the encoders is that they receive a list of vectors each of the size 512 – In the bottom encoder that would be the word embeddings, but in other encoders, it would be the output of the encoder that’s directly below. The size of this list is hyperparameter we can set – basically it would be the length of the longest sentence in our training dataset.\n",
    "After embedding the words in our input sequence, each of them flows through each of the two layers of the encoder.\n",
    "Here we begin to see one key property of the Transformer, which is that the word in each position flows through its own path in the encoder. There are dependencies between these paths in the self-attention layer. The feed-forward layer does not have those dependencies, however, and thus the various paths can be executed in parallel while flowing through the feed-forward layer.\n",
    "Next, we’ll switch up the example to a shorter sentence and we’ll look at what happens in each sub-layer of the encoder.\n",
    "Now We’re Encoding!\n",
    "As we’ve mentioned already, an encoder receives a list of vectors as input. It processes this list by passing these vectors into a ‘self-attention’ layer, then into a feed-forward neural network, then sends out the output upwards to the next encoder.\n",
    "\"\"\"\n",
    "\n",
    "# Prompt components\n",
    "persona = \"You are an expert in Large Language models. You excel at breaking down complex papers into digestible summaries.\\n\"\n",
    "instruction = \"Summarize the key findings of the paper provided.\\n\"\n",
    "context = \"Your summary should extract the most crucial points that can help researchers quickly understand the most vital information of the paper.\\n\"\n",
    "data_format = \"Create a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results.\\n\"\n",
    "audience = \"The summary is designed for busy researchers that quickly need to grasp the newest trends in Large Language Models.\\n\"\n",
    "tone = \"The tone should be professional and clear.\\n\"\n",
    "text = \"MY TEXT TO SUMMARIZE\"  # Replace with your own text to summarize\n",
    "data = f\"Text to summarize: {text}\"\n",
    "\n",
    "# The full prompt - remove and add pieces to view its impact on the generated output\n",
    "query = persona + instruction + context + data_format + audience + tone + data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 2,
     "status": "ok",
     "timestamp": 1715083839583,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "Y8hbXm9R-sr0",
    "outputId": "a4b921a4-909c-4ad8-c3c7-eaf01c9652ef"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<s><|user|>\n",
      "You are an expert in Large Language models. You excel at breaking down complex papers into digestible summaries.\n",
      "Summarize the key findings of the paper provided.\n",
      "Your summary should extract the most crucial points that can help researchers quickly understand the most vital information of the paper.\n",
      "Create a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results.\n",
      "The summary is designed for busy researchers that quickly need to grasp the newest trends in Large Language Models.\n",
      "The tone should be professional and clear.\n",
      "Text to summarize: MY TEXT TO SUMMARIZE<|end|>\n",
      "<|assistant|>\n",
      "\n"
     ]
    }
   ],
   "source": [
    "messages = [\n",
    "    {\"role\": \"user\", \"content\": query}\n",
    "]\n",
    "print(tokenizer.apply_chat_template(messages, tokenize=False))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 31865,
     "status": "ok",
     "timestamp": 1715073730373,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "-jO_uqNMTiWv",
    "outputId": "243160b8-c29a-4858-d72e-ac076b5cdc46"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Key Findings:\n",
      "\n",
      "- The Transformer model utilizes attention mechanisms to improve the speed and performance of deep learning models, particularly in neural machine translation tasks.\n",
      "- It outperforms the Google Neural Machine Translation model in specific tasks and is recommended by Google Cloud for their Cloud TPU offering.\n",
      "- The Transformer model consists of an encoding component and a decoding component, both composed of stacks of identical encoders and decoders.\n",
      "- The encoders use self-attention layers to process input sequences, while the decoders incorporate attention layers to focus on relevant parts of the input.\n",
      "- The model employs embedding algorithms to convert input words into vectors of size 512, which are then processed through the encoder's layers.\n",
      "- The Transformer model's architecture allows for parallelization, enabling faster training and improved performance.\n",
      "\n",
      "Summary:\n",
      "\n",
      "- The Transformer model, introduced in the paper \"Attention is All You Need,\" revolutionizes deep learning models by leveraging attention mechanisms to enhance performance and training speed.\n",
      "- It surpasses the Google Neural Machine Translation model in specific tasks and is recommended by Google Cloud for their Cloud TPU offering.\n",
      "- The model comprises an encoding component and a decoding component, each consisting of stacks of identical encoders and decoders.\n",
      "- Encoders utilize self-attention layers to process input sequences, while decoders incorporate attention layers to focus on relevant input parts.\n",
      "- Embedding algorithms convert input words into vectors of size 512, which are processed through the encoder's layers.\n",
      "- The Transformer model's architecture enables parallelization, resulting in faster training and improved performance.\n",
      "\n",
      "The Transformer model, introduced in the paper \"Attention is All You Need,\" revolutionizes deep learning models by leveraging attention mechanisms to enhance performance and training speed. It surpasses the Google Neural Machine Translation model in specific tasks and is recommended by Google Cloud for their Cloud TPU offering. The model comprises an encoding component and a decoding component, each consisting of stacks of identical encoders and decoders. Encoders utilize self-attention layers to process input sequences, while decoders incorporate attention layers to focus on relevant input parts. Embedding algorithms\n"
     ]
    }
   ],
   "source": [
    "# Generate the output\n",
    "outputs = pipe(messages)\n",
    "print(outputs[0][\"generated_text\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Ol5aCUfM9bjM"
   },
   "source": [
    "## In-Context Learning: Providing Examples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 349,
     "status": "ok",
     "timestamp": 1715085868022,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "Ne92BtrXU8k8",
    "outputId": "ce08f1ae-8fb4-4c39-cb18-8d975f5ced31"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<s><|user|>\n",
      "A 'Gigamuru' is a type of Japanese musical instrument. An example of a sentence that uses the word Gigamuru is:<|end|>\n",
      "<|assistant|>\n",
      "I have a Gigamuru that my uncle gave me as a gift. I love to play it at home.<|end|>\n",
      "<|user|>\n",
      "To 'screeg' something is to swing a sword at it. An example of a sentence that uses the word screeg is:<|end|>\n",
      "<|assistant|>\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Use a single example of using the made-up word in a sentence\n",
    "one_shot_prompt = [\n",
    "    {\n",
    "        \"role\": \"user\",\n",
    "        \"content\": \"A 'Gigamuru' is a type of Japanese musical instrument. An example of a sentence that uses the word Gigamuru is:\"\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"assistant\",\n",
    "        \"content\": \"I have a Gigamuru that my uncle gave me as a gift. I love to play it at home.\"\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"user\",\n",
    "        \"content\": \"To 'screeg' something is to swing a sword at it. An example of a sentence that uses the word screeg is:\"\n",
    "    }\n",
    "]\n",
    "print(tokenizer.apply_chat_template(one_shot_prompt, tokenize=False))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 3555,
     "status": "ok",
     "timestamp": 1715085873636,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "Ox3LeHW5JfaD",
    "outputId": "e803e67e-b0c2-4823-d1c9-d1f9bba1642b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " During the intense duel, the knight skillfully screeged his opponent's shield, forcing him to defend himself.\n"
     ]
    }
   ],
   "source": [
    "# Generate the output\n",
    "outputs = pipe(one_shot_prompt)\n",
    "print(outputs[0][\"generated_text\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "cugB9temhHJ0"
   },
   "source": [
    "## Chain Prompting: Breaking up the Problem\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 3336,
     "status": "ok",
     "timestamp": 1715086284771,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "VXYuFn9eG2t4",
    "outputId": "3f96c196-8d37-44ca-fad5-94ddcbe8fc57"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Name: \"MindMeld Messenger\"\n",
      "\n",
      "Slogan: \"Unleashing Intelligent Conversations, One Response at a Time\"\n"
     ]
    }
   ],
   "source": [
    "# Create name and slogan for a product\n",
    "product_prompt = [\n",
    "    {\"role\": \"user\", \"content\": \"Create a name and slogan for a chatbot that leverages LLMs.\"}\n",
    "]\n",
    "outputs = pipe(product_prompt)\n",
    "product_description = outputs[0][\"generated_text\"]\n",
    "print(product_description)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 8805,
     "status": "ok",
     "timestamp": 1715086322608,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "fNYi3eDRG9Sk",
    "outputId": "0e1f5a80-27a4-42ef-b514-36e54dea708c"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Introducing MindMeld Messenger - your ultimate communication partner! Unleash intelligent conversations with our innovative AI-powered messaging platform. With MindMeld Messenger, every response is thoughtful, personalized, and timely. Say goodbye to generic replies and hello to meaningful interactions. Elevate your communication game with MindMeld Messenger - where every message is a step towards smarter conversations. Try it now and experience the future of messaging!\n"
     ]
    }
   ],
   "source": [
    "# Based on a name and slogan for a product, generate a sales pitch\n",
    "sales_prompt = [\n",
    "    {\"role\": \"user\", \"content\": f\"Generate a very short sales pitch for the following product: '{product_description}'\"}\n",
    "]\n",
    "outputs = pipe(sales_prompt)\n",
    "sales_pitch = outputs[0][\"generated_text\"]\n",
    "print(sales_pitch)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IL1UpiV02V5O"
   },
   "source": [
    "# **Reasoning with Generative Models**\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6lQJhAnY2W3O"
   },
   "source": [
    "## Chain-of-Thought: Think Before Answering\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 6695,
     "status": "ok",
     "timestamp": 1715074858205,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "TnkxKmLyHGuR",
    "outputId": "3839ac23-9860-49b3-f769-20610786590a"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " The cafeteria started with 25 apples. They used 20 apples to make lunch, so they had:\n",
      "\n",
      "25 - 20 = 5 apples left.\n",
      "\n",
      "Then they bought 6 more apples, so they now have:\n",
      "\n",
      "5 + 6 = 11 apples.\n"
     ]
    }
   ],
   "source": [
    "# # Answering without explicit reasoning\n",
    "# standard_prompt = [\n",
    "#     {\"role\": \"user\", \"content\": \"Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?\"},\n",
    "#     {\"role\": \"assistant\", \"content\": \"11\"},\n",
    "#     {\"role\": \"user\", \"content\": \"The cafeteria had 25 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?\"}\n",
    "# ]\n",
    "\n",
    "# # Run generative model\n",
    "# outputs = pipe(standard_prompt)\n",
    "# print(outputs[0][\"generated_text\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 5409,
     "status": "ok",
     "timestamp": 1715086607857,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "X9jD3zPPEz_X",
    "outputId": "1b5a3d0f-1812-4299-eb95-11f9df48f7e7"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " The cafeteria started with 23 apples. They used 20 apples, so they had 23 - 20 = 3 apples left. Then they bought 6 more apples, so they now have 3 + 6 = 9 apples. The answer is 9.\n"
     ]
    }
   ],
   "source": [
    "# Answering with chain-of-thought\n",
    "cot_prompt = [\n",
    "    {\"role\": \"user\", \"content\": \"Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?\"},\n",
    "    {\"role\": \"assistant\", \"content\": \"Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.\"},\n",
    "    {\"role\": \"user\", \"content\": \"The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?\"}\n",
    "]\n",
    "\n",
    "# Generate the output\n",
    "outputs = pipe(cot_prompt)\n",
    "print(outputs[0][\"generated_text\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4jVOOib72Z0P"
   },
   "source": [
    "## Zero-shot Chain-of-Thought\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 5802,
     "status": "ok",
     "timestamp": 1715086681529,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "iyHWAk2XKKGt",
    "outputId": "c6baf07b-5b72-4997-8499-aa8015303abd"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Step 1: Start with the initial number of apples, which is 23.\n",
      "Step 2: Subtract the number of apples used to make lunch, which is 20. So, 23 - 20 = 3 apples remaining.\n",
      "Step 3: Add the number of apples bought, which is 6. So, 3 + 6 = 9 apples.\n",
      "\n",
      "The cafeteria now has 9 apples.\n"
     ]
    }
   ],
   "source": [
    "# Zero-shot Chain-of-Thought\n",
    "zeroshot_cot_prompt = [\n",
    "    {\"role\": \"user\", \"content\": \"The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have? Let's think step-by-step.\"}\n",
    "]\n",
    "\n",
    "# Generate the output\n",
    "outputs = pipe(zeroshot_cot_prompt)\n",
    "print(outputs[0][\"generated_text\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-3-bMVLT2ehH"
   },
   "source": [
    "## Tree-of-Thought: Exploring Intermediate Steps\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "3ZJo-4NBI1V-"
   },
   "outputs": [],
   "source": [
    "# Zero-shot Chain-of-Thought\n",
    "zeroshot_tot_prompt = [\n",
    "    {\"role\": \"user\", \"content\": \"Imagine three different experts are answering this question. All experts will write down 1 step of their thinking, then share it with the group. Then all experts will go on to the next step, etc. If any expert realises they're wrong at any point then they leave. The question is 'The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?' Make sure to discuss the results.\"}\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 17866,
     "status": "ok",
     "timestamp": 1715086782469,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "qieQ9waXLbZO",
    "outputId": "6cfb9a7f-7b8f-4b0b-ac78-5aa4206175b8"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Expert 1: Step 1 - Start with the initial number of apples: 23 apples.\n",
      "\n",
      "Expert 2: Step 1 - Subtract the apples used for lunch: 23 - 20 = 3 apples remaining.\n",
      "\n",
      "Expert 3: Step 1 - Add the newly bought apples: 3 + 6 = 9 apples.\n",
      "\n",
      "\n",
      "Expert 1: Step 2 - Confirm the final count: The cafeteria has 9 apples.\n",
      "\n",
      "Expert 2: Step 2 - Review the calculations: 23 - 20 = 3, then 3 + 6 = 9. The calculations are correct.\n",
      "\n",
      "Expert 3: Step 2 - Agree with the result: The cafeteria indeed has 9 apples.\n",
      "\n",
      "\n",
      "All experts agree on the final count: The cafeteria has 9 apples.\n"
     ]
    }
   ],
   "source": [
    "# Generate the output\n",
    "outputs = pipe(zeroshot_tot_prompt)\n",
    "print(outputs[0][\"generated_text\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TyCw3A5GZ61D"
   },
   "source": [
    "# **Output Verification**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pi67V-xNZ8fS"
   },
   "source": [
    "## Providing Examples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 24562,
     "status": "ok",
     "timestamp": 1715087067979,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "NBc_HuJQY8sh",
    "outputId": "818a31a8-7dd6-4d70-feec-5db827b491d3"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " {\n",
      "  \"character_profile\": {\n",
      "    \"name\": \"Eldrin Stormbringer\",\n",
      "    \"race\": \"Human\",\n",
      "    \"class\": \"Warlock\",\n",
      "    \"level\": 5,\n",
      "    \"alignment\": \"Chaotic Good\",\n",
      "    \"attributes\": {\n",
      "      \"strength\": 8,\n",
      "      \"dexterity\": 14,\n",
      "      \"constitution\": 10,\n",
      "      \"intelligence\": 12,\n",
      "      \"wisdom\": 10,\n",
      "      \"charisma\": 16\n",
      "    },\n",
      "    \"skills\": [\n",
      "      {\n",
      "        \"name\": \"Fireball\",\n",
      "        \"proficiency\": 18\n",
      "      },\n",
      "      {\n",
      "        \"name\": \"Shadowstep\",\n",
      "        \"proficiency\": 16\n",
      "      },\n",
      "      {\n",
      "        \"name\": \"Charm Person\",\n",
      "        \"proficiency\": 14\n",
      "      }\n",
      "    ],\n",
      "    \"equipment\": [\n",
      "      {\n",
      "        \"name\": \"Stormbringer Staff\",\n",
      "        \"type\": \"Magic Weapon\",\n",
      "        \"damage\": 15\n",
      "      },\n",
      "      {\n",
      "        \"name\": \"Leather Armor\",\n",
      "        \"type\": \"Light Armor\",\n",
      "        \"defense\": 10\n",
      "      },\n",
      "      {\n",
      "        \"name\": \"Ring of Protection\",\n",
      "        \"type\": \"Magic Armor\",\n",
      "        \"defense\": 5\n",
      "      }\n",
      "    ],\n",
      "    \"background\": \"Eldrin grew up in a small village, where he was known for his mischievous nature and knack for getting into trouble. However, he discovered his affinity for magic and quickly became a skilled warlock, using his powers to protect his home and those he cares about.\"\n",
      "  }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "# Zero-shot learning: Providing no examples\n",
    "zeroshot_prompt = [\n",
    "    {\"role\": \"user\", \"content\": \"Create a character profile for an RPG game in JSON format.\"}\n",
    "]\n",
    "\n",
    "# Generate the output\n",
    "outputs = pipe(zeroshot_prompt)\n",
    "print(outputs[0][\"generated_text\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 3915,
     "status": "ok",
     "timestamp": 1715087091714,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "4cc1U9TKZJkM",
    "outputId": "95cc264e-95a7-48e7-bef8-59d1bcef9ea9"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " {\n",
      "  \"description\": \"A cunning rogue with a mysterious past, skilled in stealth and deception.\",\n",
      "  \"name\": \"Lysandra Shadowstep\",\n",
      "  \"armor\": \"Leather Cloak of the Night\",\n",
      "  \"weapon\": \"Dagger of Whispers, Throwing Knives\"\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "# One-shot learning: Providing an example of the output structure\n",
    "one_shot_template = \"\"\"Create a short character profile for an RPG game. Make sure to only use this format:\n",
    "\n",
    "{\n",
    "  \"description\": \"A SHORT DESCRIPTION\",\n",
    "  \"name\": \"THE CHARACTER'S NAME\",\n",
    "  \"armor\": \"ONE PIECE OF ARMOR\",\n",
    "  \"weapon\": \"ONE OR MORE WEAPONS\"\n",
    "}\n",
    "\"\"\"\n",
    "one_shot_prompt = [\n",
    "    {\"role\": \"user\", \"content\": one_shot_template}\n",
    "]\n",
    "\n",
    "# Generate the output\n",
    "outputs = pipe(one_shot_prompt)\n",
    "print(outputs[0][\"generated_text\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jR3QSswQ2kAq"
   },
   "source": [
    "## Grammar: Constrained Sampling\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "L06f1CYoc9Pp"
   },
   "outputs": [],
   "source": [
    "import gc\n",
    "import torch\n",
    "del model, tokenizer, pipe\n",
    "\n",
    "# Flush memory\n",
    "gc.collect()\n",
    "torch.cuda.empty_cache()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 6644,
     "status": "ok",
     "timestamp": 1715087512875,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "B-YykcIru20X",
    "outputId": "5754cfa9-ca0b-43bf-9e28-995ad18eaf0c"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: \n",
      "The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
      "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
      "You will be able to reuse this secret in all of your notebooks.\n",
      "Please note that authentication is recommended but still optional to access public models or datasets.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "from llama_cpp.llama import Llama\n",
    "\n",
    "# Load Phi-3\n",
    "llm = Llama.from_pretrained(\n",
    "    repo_id=\"microsoft/Phi-3-mini-4k-instruct-gguf\",\n",
    "    filename=\"*fp16.gguf\",\n",
    "    n_gpu_layers=-1,\n",
    "    n_ctx=2048,\n",
    "    verbose=False\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "_j9TYBVIgti6"
   },
   "outputs": [],
   "source": [
    "# Generate output\n",
    "output = llm.create_chat_completion(\n",
    "    messages=[\n",
    "        {\"role\": \"user\", \"content\": \"Create a warrior for an RPG in JSON format.\"},\n",
    "    ],\n",
    "    response_format={\"type\": \"json_object\"},\n",
    "    temperature=0,\n",
    ")['choices'][0]['message'][\"content\"]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "executionInfo": {
     "elapsed": 16,
     "status": "ok",
     "timestamp": 1715087538556,
     "user": {
      "displayName": "Maarten Grootendorst",
      "userId": "11015108362723620659"
     },
     "user_tz": -120
    },
    "id": "Kp7FBjFdsbHe",
    "outputId": "6f7ca54b-aca2-46c9-a04d-278baebce333"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "    \"name\": \"Eldrin Stormbringer\",\n",
      "    \"class\": \"Ranger\",\n",
      "    \"level\": 5,\n",
      "    \"attributes\": {\n",
      "        \"strength\": 14,\n",
      "        \"dexterity\": 18,\n",
      "        \"constitution\": 12,\n",
      "        \"intelligence\": 10,\n",
      "        \"wisdom\": 13,\n",
      "        \"charisma\": 9\n",
      "    },\n",
      "    \"skills\": {\n",
      "        \"archery\": {\n",
      "            \"proficiency\": 20,\n",
      "            \"critical_hit_chance\": 5,\n",
      "            \"damage_range\": [\n",
      "                8,\n",
      "                14\n",
      "            ]\n",
      "        },\n",
      "        \"stealth\": {\n",
      "            \"proficiency\": 17,\n",
      "            \"critical_hit_chance\": 3,\n",
      "            \"damage_range\": [\n",
      "                2,\n",
      "                6\n",
      "            ]\n",
      "        },\n",
      "        \"nature_magic\": {\n",
      "            \"proficiency\": 15,\n",
      "            \"critical_hit_chance\": 4,\n",
      "            \"healing_range\": [\n",
      "                3,\n",
      "                7\n",
      "            ],\n",
      "            \"damage_range\": [\n",
      "                -2,\n",
      "                2\n",
      "            ]\n",
      "        }\n",
      "    },\n",
      "    \"equipment\": {\n",
      "        \"weapons\": [\n",
      "            \"Longbow\",\n",
      "            \"Dagger\"\n",
      "        ],\n",
      "        \"armor\": \"Leather Armor\",\n",
      "        \"accessories\": [\n",
      "            \"Boots of Speed\",\n",
      "            \"Ring of Protection\"\n",
      "        ]\n",
      "    },\n",
      "    \"background\": \"Eldrin grew up in the wilderness, learning to hunt and track from his father. He became a skilled ranger after joining a group of adventurers.\"\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "import json\n",
    "\n",
    "# Format as json\n",
    "json_output = json.dumps(json.loads(output), indent=4)\n",
    "print(json_output)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "kxf4Xexhtel_"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "authorship_tag": "ABX9TyMRJzDTVTlBntcudlyKW5VZ",
   "gpuType": "T4",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
